Method and Apparatus for Compressed Data Storage and Retrieval

This invention relates to a method and apparatus for
compressing and decompressing data. Embodiments of the
invention are particularly useful in 3D computer graphics
systems that apply detail to otherwise smooth surfaces through
the use of 'bump mapping' algorithms, such as those which store
per-pixel surface normals in a texture.

The concept of bump mapping was introduced in "Simulation
of Wrinkled Surfaces" (SIGGRAPH 1978, pp 286-292) by Blinn.
The computed shading of surfaces, which is typically done 
using a function of the incoming directions of light rays and 
a surface normal, i.e. a vector perpendicular to the surface 
at a given point, gives important clues as to the orientation 
and also roughness of that surface. Blinn's bump mapping gave 
otherwise mathematically smooth surfaces the appearance of 
roughness (or bumps) due to changes in the shading caused by
altering the computed surface normal on a per-pixel basis. The 
method uses texture mapping to obtain a perturbation vector to 
modify a surface's interpolated-per-pixel normal. 

In "Efficient Bump Mapping Hardware" (SIGGRAPH 1997, pp
303-306, and US Patent 5,949,424), Peercy et al devised a more
efficient method that directly stored 'perturbed' normal
vectors in the texture data. These normals were defined
relative to a localized tangent coordinate system. Each light
vector had to be expressed in coordinates relative to the
local tangent space coordinate system.

Because both the size of textures and the memory 
bandwidth consumed during texturing are important factors in 
computer graphics, European Patent EP 1004094 describes a
process to reduce the storage costs of the surface normal from
three coordinate values (XYZ) to just two values, thus saving
storage space and texturing bandwidth. This method takes
advantage of the fact that the surface normals are unit
vectors defined in the local surface coordinate space. As
shown in Figure 1a, the unit normals are primarily restricted
to lie in a single hemisphere.

As an alternative to the local tangent space system, the 
surface normal direction can be defined in the object's local 
coordinate space. Although this has the disadvantage that it 
is difficult to reuse portions of the bump texture for 
different areas of objects, it has the advantage that it is 
cheaper to compute the interpolated lighting vectors and that 
there is no need to store per-vertex local tangent coordinate 
systems in the model. For example, the technique in WO9527268
starts with this approach but then uses vector quantisation to
make bump map shading fast.

With the local coordinate space method, one can note that
the surface normal directions are now arbitrarily distributed
in all directions across the surface of a sphere of unit
radius (see Figure 1b), unlike the local tangent space system
where they are generally spread over one hemisphere. Although
the method of surface normal compression described in EP
1004094 can be extended by using an additional bit to choose
between the hemispheres, this is not ideal, as many of the
possible data encoding patterns are wasted.

Although not intended for storage in bump map textures,
"Geometry Compression" (SIGGRAPH 1995, pp 13-20) by Deering
describes a method for compressing a 3D unit vector into 18
bits by identifying six regions in every octant of the
unit-radius sphere. Unfortunately, 18 bits is an inconvenient size
for texture storage in a computer-texturing device where the
preferred size is typically 8 or 16 bits. Although it may be
possible to reduce some of the precision of this method so
that it does fit into, say, 16 bits, the method requires
numerous tests as well as fairly expensive trigonometric
functions. Because a contemporary 3D texturing system needs to
be able to compute in the order of a billion texturing
operations per second, it is important that these
decompression operations are relatively cheap.

Preferred embodiments of the present invention provide a 
method and apparatus for storing 3D unit vectors in a form 
that is both optimised for storage in a texture, and for ease 
of decompression in the texturing and shading engine of a 
computer 3D graphics system. They are capable of supporting 
both the local tangent space and the local coordinate space 
methods of representing surface normals. Finally, the
preferred methods make more efficient use of the
representative bits than that presented in EP 1004094.

The invention is defined with more precision in the 
appended claims to which reference should now be made. 

Embodiments of the invention will now be described in 
detail, by way of example, with reference to the attached 
figures in which: 

Figure 1a illustrates the range of unit normals needed
for the tangent space bump mapping method;

Figure 1b shows the larger range required for the local
coordinate space method;

Figure 1c shows the range compressed onto a regular
octahedron;

Figure 2 shows the preferred assignment of bits to a 16 
bit encoding of a unit vector in an embodiment of the 
invention; 

Figure 3 illustrates a decompression method and apparatus
embodying the invention;

Figure 4 illustrates details of the method and apparatus for
the step in Figure 3 of computing the reciprocal of the vector
length;

Figure 5 illustrates the distribution of possible normal
locations which can be represented in an embodiment of the
present invention using two 3-bit values;

Figure 6 illustrates the distribution of possible normal
locations for a known method of data compression and retrieval
using an approximately equal number of storage bits to that
used for the embodiment of the invention whose distribution is
illustrated in Figure 5;

Figure 7 illustrates the distribution of Figure 5 for
just one octant pair of the sphere together with the
corresponding grid of 3-bit values; and

Figure 8 illustrates a compression apparatus for
converting a 3D vector to an 'equivalent' packed form.

One embodiment of the invention has two main aspects.
Firstly, it is able to represent 3D vectors, chosen from a set
of points on a unit radius sphere, in a compressed binary
encoded format suitable for storage in computer memory.
Secondly, it is able to convert the compressed format back into
3D unit vectors. As the clearest way of describing the
embodiment of the invention is to illustrate the decompression
process, this will be the approach taken.

The spherical surface of possible unit vectors (e.g.
Figure 1b) is 'collapsed' into a regular octahedron (Figure
1c). This is divided into octants and then those are paired
to form four 'octant pairs'. In the preferred embodiment,
these pairs are chosen so that all vectors in an 'octant pair'
have the same sign for their X components and similarly the
same sign for their Y components. The sign of the Z component
distinguishes which octant of the pair the vector is in. An
encoded value thus identifies which 'octant pair' the vector
lies in and then two values, U and V, are used to locate the
vector inside the 'octant pair' region. These U and V values
can be considered to be an encoding of values in the range
[0..1].
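
One way to picture the 'collapse' onto the octahedron is to divide a vector by the sum of the absolute values of its components, so that |X| + |Y| + |Z| = 1 while the direction and the component signs are unchanged. The following is a minimal sketch under that assumption; the name ProjectToOctahedron is illustrative and does not appear in the specification.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: scales a non-zero vector so that
   |x| + |y| + |z| == 1, i.e. onto the surface of the regular
   octahedron. The direction and the signs are preserved. */
static void ProjectToOctahedron(const double in[3], double out[3])
{
    double l1 = fabs(in[0]) + fabs(in[1]) + fabs(in[2]);

    assert(l1 > 0.0);   /* the zero vector has no direction */

    out[0] = in[0] / l1;
    out[1] = in[1] / l1;
    out[2] = in[2] / l1;
}
```

Because only a positive scale factor is applied, the signs of X and Y still select the octant pair and the sign of Z still selects the octant within it.
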

In the preferred embodiment, each 3D unit vector is
encoded as a 16-bit value, as shown in Figure 2. Each value
consists of 3 fields: the 2-bit 'octant pair identifier' 10, a
7-bit U parameter 11, and a 7-bit V parameter 12. Other
choices of numbers of bits for the U and V parameters can be
selected.
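
The three fields can be packed into and extracted from a 16-bit word with shifts and masks. The sketch below assumes, purely for illustration, that the octant pair identifier occupies the top two bits, followed by U and then V; the actual ordering is defined by Figure 2, which is not reproduced here. The names PackNormal and UnpackNormal are likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not confirmed by the text):
   bits 15..14 = OPI, bits 13..7 = U, bits 6..0 = V. */
static uint16_t PackNormal(unsigned opi, unsigned u, unsigned v)
{
    assert(opi <= 3u && u <= 127u && v <= 127u);
    return (uint16_t)((opi << 14) | (u << 7) | v);
}

static void UnpackNormal(uint16_t packed,
                         unsigned *opi, unsigned *u, unsigned *v)
{
    *opi = (packed >> 14) & 0x3u;
    *u   = (packed >> 7) & 0x7Fu;
    *v   = packed & 0x7Fu;
}
```

With a different split of bits between U and V, only the shift amounts and masks would change.
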

Figure 3 shows an overview of the decompression
apparatus. The 'octant pair identifier', 10, is supplied to a
decode unit, 20. This interprets the input values to produce
a pair of numerical sign flags that will be applied to the X and Y
components of the decoded vector. The 'Add and Test
Magnitude' unit, 21, adds together the U and V values, 11 and
12, and compares the sum with 127 (which is equivalent to a
logical value of '1.0' and is half the maximum
possible value of the sum of the 7-bit U and V values). It outputs
both the result of the sum and a flag based on the result to
indicate whether the sum is above 127. The flag is then used
to determine in which of the two octants in the 'octant pair'
the vector is located.

Assignment unit 22 takes the pair of sign bits from unit 20
and the magnitude comparison result from unit 21, and combines
them with the original U and V values, 11 and 12, to produce a
vector, {X' Y' Z'}. This vector is in the same direction as
the normal vector but is not of unit length. Unit 23 computes
the reciprocal of the length of the vector {X' Y' Z'}, and
passes this to a scaling unit, 24. This then scales the vector
using the reciprocal to produce a unit vector result, 25.

The internal operation of these various units will now be
described using a C-like pseudo-code notation.

Unit 20 produces two sign flags, Xsign and Ysign, based
on the 'octant pair' identifier, 10, OPI. This is a trivial
operation in hardware and is described by the following code:

#define IS_POS (0)
#define IS_NEG (1)

switch (OPI)
{
    case 0x0:    /* OPI bits "00" */
        Xsign = IS_POS;
        Ysign = IS_POS;
        break;

    case 0x1:    /* OPI bits "01" */
        Xsign = IS_POS;
        Ysign = IS_NEG;
        break;

    case 0x2:    /* OPI bits "10" */
        Xsign = IS_NEG;
        Ysign = IS_POS;
        break;

    case 0x3:    /* OPI bits "11" */
        Xsign = IS_NEG;
        Ysign = IS_NEG;
        break;
}


The Xsign and Ysign flags are used to indicate whether the
values X and Y will be positive or negative.



Unit 21 produces the sum of U and V and a single-bit flag
that identifies to which octant of a pair of octants the data
belongs. This can be described as:

UVSum = U + V;
if (UVSum < 128)
{
    Octant = 0;
}
else
{
    Octant = 1;
}

Those skilled in the art will appreciate that in hardware,
given the range of the input parameters, i.e. [0..127] in the
preferred embodiment, the comparison and assignment amount to
selecting the top bit of the sum of the U and V values, which
has a range of [0..254].
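
The equivalence between the comparison and the top bit of the sum can be checked exhaustively in software. The sketch below is illustrative only; the function names are not taken from the specification.

```c
#include <assert.h>

/* Octant flag as written in the text: compare the sum with 128. */
static int OctantByCompare(unsigned u, unsigned v)
{
    assert(u <= 127u && v <= 127u);
    return (u + v < 128u) ? 0 : 1;
}

/* Octant flag as hardware might form it: bit 7 of the 8-bit sum. */
static int OctantByTopBit(unsigned u, unsigned v)
{
    assert(u <= 127u && v <= 127u);
    return (int)(((u + v) >> 7) & 1u);
}

/* Returns 1 if both formulations agree for every 7-bit U and V. */
static int OctantFormsAgree(void)
{
    unsigned u, v;

    for (u = 0; u <= 127u; u++)
        for (v = 0; v <= 127u; v++)
            if (OctantByCompare(u, v) != OctantByTopBit(u, v))
                return 0;
    return 1;
}
```

Since the sum never exceeds 254, it fits in 8 bits and bit 7 is set exactly when the sum is 128 or more.
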

Unit 22 produces the initial, non-unit-length vector as
follows:

if (Octant == 0)
{
    X' = U;
    Y' = V;
}
else
{
    X' = 127 - V;    /* Note swap of U and V */
    Y' = 127 - U;
}

Z' = 127 - (UVSum);

if (Xsign == IS_NEG)
{
    X' = -X';
}

if (Ysign == IS_NEG)
{
    Y' = -Y';
}

In the preferred embodiment, the X' and Y' values use
signed-magnitude format (rather than two's complement) so that
the negation and the subsequent computation of the square of
the length of the {X' Y' Z'} vector are cheaper.

Unit 23 computes the reciprocal of the length of the
initial vector as a pseudo floating-point binary number. It
initially computes the square of the length of the vector by
summing the squares of the components, i.e.

LengthSQ = X'*X' + Y'*Y' + Z'*Z';

It will be appreciated that, due to the range of input
values and the calculations performed, the possible range of
squared lengths is limited. In the preferred embodiment, this
range is 5377 to 16129 inclusive and thus can be represented
with 14 bits.
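
The stated range can be confirmed by scanning every 7-bit U and V pair and applying the assignments of units 21 and 22. The following brute-force sketch is illustrative; the function name SquaredLengthRange is an assumption.

```c
#include <assert.h>

/* Scans every 7-bit U,V pair and records the smallest and largest
   value of X'*X' + Y'*Y' + Z'*Z'. The X and Y sign flags are
   ignored because they do not affect the squares. */
static void SquaredLengthRange(int *minSq, int *maxSq)
{
    int u, v;

    assert(minSq != 0 && maxSq != 0);
    *minSq = 3 * 127 * 127;
    *maxSq = 0;

    for (u = 0; u <= 127; u++)
    {
        for (v = 0; v <= 127; v++)
        {
            int sum = u + v;
            int x = (sum < 128) ? u : 127 - v;
            int y = (sum < 128) ? v : 127 - u;
            int z = 127 - sum;
            int sq = x * x + y * y + z * z;

            if (sq < *minSq) *minSq = sq;
            if (sq > *maxSq) *maxSq = sq;
        }
    }
}
```

The minimum, 5377, arises when the magnitudes are as equal as possible (42, 42 and 43), and the maximum, 16129, when a single component has magnitude 127.
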

The reciprocal of the square root of this squared length
is then computed in a pseudo floating-point format, using any
method known in the art. With reference to Figure 4, in the
preferred embodiment, this calculation can be done using a
normalizing shifter, 50, to shift the input an even number of
bits to the left, of which the 10 topmost bits will be
selected. Note that the most significant '1' bit is then in one
of the two most significant bit locations. The 10-bit result,
51, effectively represents a fixed-point number in the range
[256,1023], and is then used to access a lookup table, 52.
This lookup table returns an 11-bit fixed-point result in the
range [1023^-0.5, 1/16] corresponding to the reciprocal of the
square root of the input to 52 multiplied by a suitable power
of two.

This result, re-combined with the normalising shift amount
divided by two, 53, added to an additional shift, 54, then
constitutes the reciprocal square root of the original squared
sum in a pseudo floating point form. The additional shift 
value, in the example embodiment, is a. The value is chosen 
to allow for the magnitude of the results, the number of 
significant bits to be output from the look up table, and the 
number of bits of precision required for the final normalised 
vector.

Finally, unit 24 just multiplies each of the X', Y' and Z'
components by the pseudo floating-point value (i.e. a multiply
followed by shifts and truncation) to obtain the normalized
result, representing signed 8-bit fixed-point values in the
range [-1,1] with 7 fractional bits.

For illustrative purposes, figure 5 displays the 
distribution of normal locations for an example embodiment 
where U and V are only assigned 3 bits each. As a comparison, 
figure 6 shows the distribution of points for the adapted 
version of the method described previously in European Patent 
1004094, using an approximately equivalent number of storage 
bits. As can be seen, the storage is not as even, nor as 
dense. 

Figure 7 shows the distribution for just one octant pair
of the sphere with the corresponding grid of U and V values.

The opposite of the decompression process, i.e.
compression, is essentially the described process run in
reverse. This will be described with reference to Figure 8.
A (preferably) unit vector, V, 100, is analysed, 101, and the
signs of the X, Y and Z components extracted to determine which
octant the unit vector lies in. The signs of the X and Y
components are used to construct the OPI encoding, 102. A
scaling factor for the vector is computed, 103, so that, for the
scaled vector, |V[Z]| = 127 - |V[X]| - |V[Y]|. The scaling factor
is applied to the X and Y values, 104. Given the sign of the Z
component, the U and V components are then computed from the
scaled X and Y values, 105.

Due to rounding/truncation approximations in the
decompression process, it is possible that a 'closer' match
can be found by trying some of the 'neighbouring' encodings.
This procedure is described by the following 'C' code:

#include <float.h>    /* FLT_MAX */
#include <math.h>     /* fabs, floor */

#define MAX(a, b) (((a) > (b)) ? (a) : (b))
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Prototypes inferred from the calls below. */
void DecodeNorm(int U, int V, int OPI, float Result[3]);
double MeasureAngle(const float Vec1[3], const float Vec2[3]);

void EncodeNormal(const float Norm[3],
                  int *pU, int *pV, int *pOPI)
{
    int OPI;
    float x, y, z;
    float BestError, AngleError;
    int InitialU, InitialV, BestU = 0, BestV = 0, Ui, Vi;
    float scale;
    float PackedNorm[3];

    if (Norm[1] < 0.0)
    {
        OPI = 0x1;
    }
    else
    {
        OPI = 0x0;
    }

    if (Norm[0] < 0.0)
    {
        OPI |= 0x2;
    }

    // Return the OPI encoding.
    *pOPI = OPI;

    x = fabs(Norm[0]);
    y = fabs(Norm[1]);
    z = Norm[2];

    scale = 127 / (fabs(z) + x + y);
    x *= scale;
    y *= scale;

    /*
    // Compute initial integer U & V values
    // (taking Z sign into account)
    */
    if (z < 0)
    {
        InitialU = (int) floor(127 - y);
        InitialV = (int) floor(127 - x);
    }
    else
    {
        InitialU = (int) floor(x);
        InitialV = (int) floor(y);
    }

    /*
    // Try a neighbourhood of U and V values.
    // The upper bounds are inclusive so that 127 can be tried.
    */
    BestError = FLT_MAX;

    for (Ui = MAX(InitialU - 1, 0);
         Ui <= MIN(InitialU + 2, 127); Ui++)
    {
        for (Vi = MAX(InitialV - 1, 0);
             Vi <= MIN(InitialV + 2, 127); Vi++)
        {
            // Compute normal from the trial U,V and OPI
            DecodeNorm(Ui, Vi, OPI, PackedNorm);
            AngleError = (float) MeasureAngle(Norm, PackedNorm);
            if (AngleError < BestError)
            {
                BestError = AngleError;
                BestU = Ui;
                BestV = Vi;
            }
        } /*end for Vi*/
    } /*end for Ui*/

    /*
    // Return the results
    */
    *pU = BestU;
    *pV = BestV;
}



The 'MeasureAngle' routine can be implemented either by computing
the angle exactly, i.e.,

double MeasureAngle(const float Vec1[3],
                    const float Vec2[3])
{
    return acos(DP(Vec1, Vec2) / sqrt(DP(Vec1, Vec1) *
                                      DP(Vec2, Vec2)));
}



or more cheaply using this simpler metric:

double MeasureAngle(const float Vec1[3],
                    const float Vec2[3])
{
    return fabs(1.0 - DP(Vec1, Vec2) / sqrt(DP(Vec1, Vec1) *
                                            DP(Vec2, Vec2)));
}

The DP function computes the usual dot product, i.e.,

double DP(const float A[3], const float B[3])
{
    return A[0]*B[0] + A[1]*B[1] + A[2]*B[2];
}

It should be noted that this technique is also useful for
applications other than bump mapping which require reduced
cost storage of unit vectors. The compression of vertex
geometry, as described by Deering, is one such application.